{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "```json\n",
    "@article{weng2020transformer,\n",
    "  title   = \"The Transformer Family\",\n",
    "  author  = \"Weng, Lilian\",\n",
    "  journal = \"lilianweng.github.io/lil-log\",\n",
    "  year    = \"2020\",\n",
    "  url     = \"https://lilianweng.github.io/lil-log/2020/03/27/the-transformer-family.html\"\n",
    "}\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 注意力与自注意力"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "注意力：\n",
    "- 神经网络中，模型通过**选择性的关注给定数据集**来进行预测的机制；\n",
    "- 注意力用模型学习到的权重来量化，注意力的输出通常为加权平均\n",
    "\n",
    "自注意力：\n",
    "- 模型对数据样本的一部分进行预测时，会使用同样的样本的其他部分提取出来的信息\n",
    "\n",
    "实现形式：\n",
    "- 多种实现形式\n",
    "- 点积注意力，给定 `query(Q),key(K),value(V)`，\n",
    "$\\text{Attention}(\\mathbf{Q}, \\mathbf{K}, \\mathbf{V}) = \\text{softmax}(\\frac{\\mathbf{Q} {\\mathbf{K}}^\\top}{\\sqrt{d_k}})\\mathbf{V}$\n",
    "    - 对应的权重值：\n",
    "    $$a_{ij} = \\text{softmax}(\\frac{\\mathbf{q}_i {\\mathbf{k}_j}^\\top}{\\sqrt{d_k}})\n",
    "= \\frac{\\exp(\\mathbf{q}_i {\\mathbf{k}_j}^\\top)}{ \\sqrt{d_k} \\sum_{r \\in S_i} \\exp(\\mathbf{q}_i {\\mathbf{k}_r}^\\top) }$$"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 多头自注意力"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<img src=\"../images/multi_head_attention.png\" width=\"40%\">"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Transformer"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<img src=\"../images/Transformer-architecture.png\" width=\"80%\" align=\"left\">"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 编码-解码器架构"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 位置编码"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "与输入词嵌入有相同的尺寸，直接与其相加，再输入到编码器或解码器\n",
    "-  `Sinusoidal positional encoding`：\n",
    "    $$\\text{PE}(i,\\delta) = \n",
    "\\begin{cases}\n",
    "\\sin(\\frac{i}{10000^{2\\delta'/d}}) & \\text{if } \\delta = 2\\delta'\\\\\n",
    "\\cos(\\frac{i}{10000^{2\\delta'/d}}) & \\text{if } \\delta = 2\\delta' + 1\\\\\n",
    "\\end{cases}$$\n",
    "- `Learned positional encoding`：位置编码作为参数，每个位置一个向量表示，通过模型学习得到"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 提升性能"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 辅助任务"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "```\n",
    "1. 在序列末尾产生一个预测 --> 每个中间位置也被要求产生正确预测，强制模型预测窗口内容\n",
    "2. 每个中间层也会产生预测，每层预测损失有一个权重；层次越高，预测损失对总损失的贡献越大，相应权重越大\n",
    "3. 序列每个位置预测多个目标，如在位置2进行预测：位置3和位置4处的单词各是什么\n",
    "```\n",
    "<img src=\"../images/transformer-aux-losses.png\" width=\"100%\">\n",
    "\n",
    "\n",
    "原始论文：    \n",
    "https://arxiv.org/pdf/1808.04444.pdf"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 自适应计算时间TODO\n",
    "Adaptive Computation Time (ACT)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 提升注意力的宽度"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "普通的Transformer注意力的宽度是固定的长度，模型只能关注同一片段中的信息，不同信息不能再这些定长的片段中流动，因此：\n",
    "- 模型不能捕捉长距离依赖\n",
    "- 无法预测给定片段的前几个标记\n",
    "- 评估时代价昂贵：每次片段向右移一位，新的片段重新被处理，尽快比上一步只多出一个标记"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Transformer-XL 两个主要改进，以解决上下文隔离的问题\n",
    "- 重利用不同片段的隐藏状态\n",
    "- 采用了新的位置编码"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 重利用隐藏状态\n",
    "原始论文：  \n",
    "https://arxiv.org/abs/1901.02860"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<img src=\"../images/transformer-XL-training.png\" width=\"100%\">\n",
    "模型处理片段 $(\\tau + 1)$ 时，第 $n$ 层的隐藏状态 $\\mathbf{h}_{\\tau+1}^{(n)} \\in \\mathbb{R}^{L \\times d}$，不仅依赖于同一片段的上一层隐藏状态 $\\mathbf{h}_{\\tau+1}^{(n-1)} \\in \\mathbb{R}^{L \\times d}$，还依赖于上一片段同一层的隐藏状态 $\\mathbf{h}_{\\tau}^{(n)} \\in \\mathbb{R}^{L \\times d}$\n",
    "\n",
    "\n",
    "$$\\begin{aligned}\n",
    "\\color{red}{\\widetilde{\\mathbf{h}}_{\\tau+1}^{(n-1)}} &= [\\text{stop-gradient}(\\mathbf{h}_{\\tau}^{(n-1)}) \\circ \\mathbf{h}_{\\tau+1}^{(n-1)}] \\\\\n",
    "\\mathbf{Q}_{\\tau+1}^{(n)} &= \\mathbf{h}_{\\tau+1}^{(n-1)}\\mathbf{W}^q \\\\\n",
    "\\mathbf{K}_{\\tau+1}^{(n)} &= \\color{red}{\\widetilde{\\mathbf{h}}_{\\tau+1}^{(n-1)}} \\mathbf{W}^k \\\\\n",
    "\\mathbf{V}_{\\tau+1}^{(n)} &= \\color{red}{\\widetilde{\\mathbf{h}}_{\\tau+1}^{(n-1)}} \\mathbf{W}^v \\\\\n",
    "\\mathbf{h}_{\\tau+1}^{(n)} &= \\text{transformer-layer}(\\mathbf{Q}_{\\tau+1}^{(n)}, \\mathbf{K}_{\\tau+1}^{(n)}, \\mathbf{V}_{\\tau+1}^{(n)})\n",
    "\\end{aligned}$$\n",
    "\n",
    "上式中，注意力的 key 和 value 都依赖于扩展的隐藏状态，而 query 只与当前片段的隐藏层状态，扩展隐藏状态时的连接 $[. \\circ .]$ 操作是沿着序列长度反向；通过整合之前片段的隐藏信息，模型将注意力的宽度扩展到了多个片段"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "### 相对位置编码"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "新形式的注意力机制，使用普通transfromer的绝对位置编码，会导致不同片段有相同的位置编码；\n",
    "   - 为了适应新形式的注意力，相对位置编码：知道 query 向量$\\mathbf{q}_{\\tau, i}$和key向量$\\mathbf{k}_{\\tau, j}$之间的位置距离 $i-j$，应该就能很好的进行预测\n",
    "     \n",
    "   $$\\begin{aligned}\n",
    "a_{ij} \n",
    "&= \\mathbf{q}_i {\\mathbf{k}_j}^\\top = (\\mathbf{x}_i + \\mathbf{p}_i)\\mathbf{W}^q ((\\mathbf{x}_j + \\mathbf{p}_j)\\mathbf{W}^k)^\\top \\\\\n",
    "&= \\mathbf{x}_i\\mathbf{W}^q {\\mathbf{W}^k}^\\top\\mathbf{x}_j^\\top + \\mathbf{x}_i\\mathbf{W}^q {\\mathbf{W}^k}^\\top\\mathbf{p}_j^\\top + \\mathbf{p}_i\\mathbf{W}^q {\\mathbf{W}^k}^\\top\\mathbf{x}_j^\\top + \\mathbf{p}_i\\mathbf{W}^q {\\mathbf{W}^k}^\\top\\mathbf{p}_j^\\top\n",
    "\\end{aligned}$$\n",
    "    - 将上式重新处理成：\n",
    "    $$a_{ij}^\\text{rel} = \n",
    "\\underbrace{ \\mathbf{x}_i\\mathbf{W}^q \\color{blue}{ {\\mathbf{W}_E^k}^\\top } \\mathbf{x}_j^\\top }_\\text{content-based addressing} + \n",
    "\\underbrace{ \\mathbf{x}_i\\mathbf{W}^q \\color{blue}{ {\\mathbf{W}_R^k}^\\top } \\color{green}{\\mathbf{r}_{i-j}^\\top} }_\\text{content-dependent positional bias} + \n",
    "\\underbrace{ \\color{red}{\\mathbf{u}} \\color{blue}{ {\\mathbf{W}_E^k}^\\top } \\mathbf{x}_j^\\top }_\\text{global content bias} + \n",
    "\\underbrace{ \\color{red}{\\mathbf{v}} \\color{blue}{ {\\mathbf{W}_R^k}^\\top } \\color{green}{\\mathbf{r}_{i-j}^\\top} }_\\text{global positional bias}$$\n",
    "        - 将 $\\mathbf{p}_j$ 替换成相对位置编码 $\\mathbf{r}_{i-j} \\in \\mathbf{R}^{d}$\n",
    "        - 将 $\\mathbf{p}_i\\mathbf{W}^q$ 替换成两个可训练参数$\\mathbf{u}$（用于内容）和$\\mathbf{v}$（用于位置）\n",
    "        - 将 $\\mathbf{W}^k$ 拆分成 $\\mathbf{W}^k_E$ 表示内容信息，$\\mathbf{W}^k_R$表示位置信息"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 自适应注意力宽度"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Transformer 的一个优点是能捕捉长距离依赖。   \n",
    "- 但是不同的输入片段，模型可能注意到不同的宽度的信息；或者一个注意力头可能和其它注意力头有不同的注意力模式   \n",
    "\n",
    "如果模型能根据需要，灵活调整注意力宽度，可以在减少计算量和内存消耗时，支持更大的上下文长度   \n",
    "原始连接：https://arxiv.org/abs/1905.07799\n",
    "- 同一上下文片段，不同的注意力头获得不同的注意力权重分布，因此不同的注意力头的最优宽度应该分开训练"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "普通注意力计算过程\n",
    "$$\n",
    "\\begin{aligned}\n",
    "e_{ij} &= \\mathbf{q}_i {\\mathbf{k}_j}^\\top \\\\ \n",
    "a_{ij} &= \\text{softmax}(e_{ij}) = \\frac{\\exp(e_{ij})}{\\sum_{r=i-s}^{i-1} \\exp(e_{ir})} \\\\\n",
    "\\mathbf{y}_i &= \\sum_{r=i-s}^{i-1}a_{ir}\\mathbf{v}_r = \\sum_{r=i-s}^{i-1}a_{ir}\\mathbf{x}_r\\mathbf{W}^v\n",
    "\\end{aligned}\n",
    "$$\n",
    "- query向量 $\\mathbf{q}_i$ ，key向量 $\\mathbf{k}_j,j \\in S$ ，value向量 $\\mathbf{v}_j,j \\in S$ ，$S$ 为query向量做注意力的上下文，其宽度即为注意力的宽度\n",
    "\n",
    "   \n",
    "自适应注意力：     \n",
    "- 添加mask函数控制注意力的宽度，将query和key之间的距离映射为 $[0,1]$ 间的值  \n",
    "$$m_z(x) = \\text{clamp}(\\frac{1}{R}(R+z-x), 0, 1)$$\n",
    "其中 z 即为要学习的参数， R 为超参数决定mask的软硬程度\n",
    "<img src=\"../images/soft-masking-function.png\" width=\"60%\">\n",
    "- 注意力计算变为：\n",
    "$$a_{ij} = \\frac{m_z(i-j)\\exp(s_{ij})}{\\sum_{r=i-s}^{i-1}m_z(i-r) \\exp(s_{ir})}$$\n",
    "上述方程中，$z$ 是可微的，因此和其它参数一起训练；每个头的参数 $z^{(i)}, i=1, \\dots, h$ 各自进行训练，且损失受到额外的正则化限制：$\\sum_{i=1}^h z^{(i)}$\n",
    "\n",
    "- 自适应注意力宽度发现一般的趋势：较低的模型层不需要较长的宽度，高层的模型层的少年注意力头可能需要非常长的宽度"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 本地注意力（Image Transformer）TODO\n",
    "Localized Attention Span "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 减少时间和内存消耗"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 稀疏注意力矩阵分解 Sparse Attention Matrix Factorization (Sparse Transformers)   \n",
    "普通Transformer的时间和空间消耗随着序列长度二次方增长\n",
    "- 因此很难用于长序列 \n",
    "\n",
    "  \n",
    "$$\n",
    "\\begin{aligned}\n",
    "\\text{Attend}(\\mathbf{X}, \\mathcal{S}) &= \\Big( a(\\mathbf{x}_i, S_i) \\Big)_{i \\in \\{1, \\dots, L\\}} \\\\\n",
    "\\text{ where } a(\\mathbf{x}_i, S_i) &= \\text{softmax}\\Big(\\frac{(\\mathbf{x}_i \\mathbf{W}^q)(\\mathbf{x}_j \\mathbf{W}^k)_{j \\in S_i}^\\top}{\\sqrt{d_k}}\\Big) (\\mathbf{x}_j \\mathbf{W}^v)_{j \\in S_i}\n",
    "\\end{aligned}\n",
    "$$\n",
    "- 上式中，$\\mathcal{S} = \\{S_1, \\dots, S_n\\}$，$S_i$ 表示第 $i$ 个qeury向量做注意力的key向量的位置范围，尽管$S_i$的尺寸不是固定的，$\\mathbf{x}_j \\mathbf{W}^v$ 表示value向量，但最终的输出尺寸是固定的 $\\text{Attend}(\\mathbf{X}, \\mathcal{S}) \\in \\mathbb{R}^{L \\times d_v}$\n",
    "- 在auto-regressive模型中，每个标记对其之间的所有标记进行注意力操作 $S_i = \\{j: j \\leq i\\}$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Sparse Transformer，原始链接：https://arxiv.org/abs/1904.10509  。   \n",
    "稀疏注意力分解，将集合$S_i$拆分成一个依赖树；每一对$(i,j),j \\leq i$ 都有一个路径链接两者；具体而言\n",
    "将集合 $S_i$ 拆分成不重叠的 $p$ 个子集 $A^{(m)}_i \\subset S_i, m = 1,\\dots, p$\n",
    "<img src=\"../images/sparse-attention.png\" width=\"100%\">"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-05-22T02:26:43.241548Z",
     "start_time": "2020-05-22T02:26:43.233332Z"
    }
   },
   "source": [
    "### Locality-Sensitive Hashing (Reformer)\n",
    "论文链接：https://arxiv.org/abs/2001.04451"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Transformer 的痛点：\n",
    "- N层模型的空间消耗，比单层的大N倍\n",
    "- 中间前向层非常大\n",
    "- 长$L$的序列对应的注意力矩阵，将需要 $O(L^2)$ 的空间和时间消耗"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Reformer的改进：\n",
    "- 点积注意力替换成 *`locality-sensitive hashing (LSH) attention`*，将复杂度从 $O(L^2)$ 降低到 $O(L\\log L)$\n",
    "- 将标准的残差模块替换成 *`reversible residual layers`*，只储存激活一次而不是N词"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Locality-Sensitive Hashing Attention\n",
    "注意力权重分布中 $\\mathbf{Q} \\mathbf{K}^\\top$ ，其实只关注较大的几个元素；即对 $q_i$ 需要从key矩阵 $K$ 中找到与$q_i$ 最接近的几个；使用 *`Locality-Sensitive Hashing`* 技术实现：  \n",
    "  \n",
    "$$h(x) = \\arg\\max([xR; −xR])$$\n",
    "- 给定随机矩阵 $\\mathbf{R} \\in \\mathbb{R}^{d \\times b/2}$，计算向量 $x $的哈希值\n",
    "\n",
    "然后一个query向量只能与其在同一个 hasing bucket 内的 key 向量 $S_i = \\{j: h(\\mathbf{q}_i) = h(\\mathbf{k}_j)\\}$ 做注意力\n",
    "<img src=\"../images/LSH-attention-matrix.png\" width=\"100%\">\n",
    "如上图所示\n",
    "(a) 通常注意力矩阵是稀疏的  \n",
    "(b) 使用局部哈希技术，将 keys 和 queries 按照其 hashing buckets 排列   \n",
    "(c) 设置 $\\mathbf{Q} = \\mathbf{K}$ 便于批处理    \n",
    "(d) 分批处理\n",
    "<img src=\"../images/LSH-attention.png\" width=\"80%\">"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### reversible residual layers\n",
    "\n",
    "- 通常的残差层：$y = x + F(x)$\n",
    "- reversible layer 将输入和输出各自拆分成对：$y_1 = x_1 + F(x_2),\\; y_2 = x_2 + G(y_1)$\n",
    "- reformer ：\n",
    "$$Y_1 = X_1 + \\text{Attention}(X_2), \\; Y_2 = X_2 + \\text{FeedForward}(Y_1)$$"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 递归Transformer（Universal Transformer）\n",
    "原始链接：https://arxiv.org/abs/1807.03819"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<img src=\"../images/universal-transformer.png\" width=\"100%\">"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.7"
  },
  "toc": {
   "base_numbering": 1,
   "nav_menu": {},
   "number_sections": true,
   "sideBar": true,
   "skip_h1_title": false,
   "title_cell": "Table of Contents",
   "title_sidebar": "Contents",
   "toc_cell": false,
   "toc_position": {},
   "toc_section_display": true,
   "toc_window_display": false
  },
  "varInspector": {
   "cols": {
    "lenName": 16,
    "lenType": 16,
    "lenVar": 40
   },
   "kernels_config": {
    "python": {
     "delete_cmd_postfix": "",
     "delete_cmd_prefix": "del ",
     "library": "var_list.py",
     "varRefreshCmd": "print(var_dic_list())"
    },
    "r": {
     "delete_cmd_postfix": ") ",
     "delete_cmd_prefix": "rm(",
     "library": "var_list.r",
     "varRefreshCmd": "cat(var_dic_list()) "
    }
   },
   "types_to_exclude": [
    "module",
    "function",
    "builtin_function_or_method",
    "instance",
    "_Feature"
   ],
   "window_display": false
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
